latent domain
Discover, Hallucinate, and Adapt: Open Compound Domain Adaptation for Semantic Segmentation
Deep learning-based approaches have achieved great success in semantic segmentation [24, 43, 2, 7, 42, 3, 17, 10], thanks to large amounts of fully annotated data. However, collecting large-scale, accurate pixel-level annotations can be extremely time-consuming and costly [6]. An appealing alternative is to use off-the-shelf simulators to render synthetic data for which ground-truth annotations are generated automatically [33, 34, 32]. Unfortunately, models trained purely on simulated data often fail to generalize to the real world due to domain shifts.
Learning to Adapt via Latent Domains for Adaptive Semantic Segmentation
Semantic segmentation is a popular task in computer vision, which assigns pixel-wise semantic labels to given images. It has been widely used to facilitate downstream applications such as video surveillance and autonomous driving. Recent progress in image semantic segmentation has been driven by deep neural networks trained on large amounts of labeled data, which are expensive to obtain. An alternative is to generate synthetic images for which pixel-level ground truth is readily available with little effort [1,2].
Discover, Hallucinate, and Adapt: Open Compound Domain Adaptation for Semantic Segmentation
Unsupervised domain adaptation (UDA) for semantic segmentation has been attracting attention recently, as it could be beneficial for various label-scarce real-world scenarios (e.g., robot control, autonomous driving, medical imaging, etc.). Despite the significant progress in this field, current works mainly focus on a single-source single-target setting, which cannot handle more practical settings of multiple targets or even unseen targets. In this paper, we investigate open compound domain adaptation (OCDA), which deals with mixed and novel situations at the same time, for semantic segmentation. We present a novel framework based on three main design principles: discover, hallucinate, and adapt. The scheme first clusters compound target data based on style, discovering multiple latent domains (discover).
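The "discover" step described above can be illustrated with a minimal sketch. The snippet does not say which style descriptor or clustering algorithm is used, so this example assumes per-channel mean and standard deviation as the style vector and plain k-means for clustering. The function names (`style_vector`, `kmeans`, `discover_latent_domains`) and the toy images are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean, pstdev


def style_vector(image):
    """Assumed style descriptor: per-channel mean and standard deviation.

    `image` is a list of channels, each a flat list of pixel values.
    """
    feats = []
    for channel in image:
        feats.append(mean(channel))
        feats.append(pstdev(channel))
    return feats


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [mean(col) for col in zip(*members)]
    return labels


def discover_latent_domains(images, k):
    """Cluster unlabeled target images into k latent domains by style."""
    return kmeans([style_vector(img) for img in images], k)


# Toy compound target set: three dark and three bright 3-channel images.
dark = [[[10 + i, 12 + i, 14 + i, 16 + i]] * 3 for i in range(3)]
bright = [[[200 + i, 210 + i, 220 + i, 230 + i]] * 3 for i in range(3)]
images = dark + bright
labels = discover_latent_domains(images, k=2)
```

In this toy setting the dark and bright images end up in separate clusters. A real pipeline would more likely compute the style statistics on learned feature maps than on raw pixels, which is again an assumption rather than something the snippet states.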
Realism Control One-step Diffusion for Real-World Image Super-Resolution
Wu, Zongliang, Zheng, Siming, Jiang, Peng-Tao, Yuan, Xin
Pre-trained diffusion models have shown great potential in real-world image super-resolution (Real-ISR) tasks by enabling high-resolution reconstructions. While one-step diffusion (OSD) methods significantly improve efficiency compared to traditional multi-step approaches, they still have limitations in balancing fidelity and realism across diverse scenarios. Since OSD models for SR are usually trained or distilled at a single timestep, they lack flexible control mechanisms to adaptively prioritize these competing objectives, which multi-step methods can manage inherently by adjusting the number of sampling steps. To address this challenge, we propose a Realism Controlled One-step Diffusion (RCOD) framework for Real-ISR. RCOD provides a latent domain grouping strategy that enables explicit control over fidelity-realism trade-offs during the noise prediction phase, with minimal modifications to the training paradigm and using the original training data. A degradation-aware sampling strategy is also introduced to align distillation regularization with the grouping strategy and enhance control over the trade-offs. Moreover, a visual prompt injection module is used to replace conventional text prompts with degradation-aware visual tokens, enhancing both restoration accuracy and semantic consistency. Our method achieves superior fidelity and perceptual quality while maintaining computational efficiency. Extensive experiments demonstrate that RCOD outperforms state-of-the-art OSD methods in both quantitative metrics and visual quality, with flexible realism control at inference time.